Plotting dimension reductions

Dimension reductions can be plotted with the plot_scdata function:

plot_scdata(scRNA_int, pal_setup = pal)
UMAP plotting, colored by clusters

plot_scdata takes three optional arguments: color_by, split_by, and pal_setup. By default, color_by colors cells by "seurat_clusters"; it can be set to any factor in the metadata, such as "sample" or "group":

plot_scdata(scRNA_int, color_by = "group", pal_setup = pal)
UMAP plotting, colored by groups

If split_by is set to a factor in the metadata, the plot is split into panels by that factor:

plot_scdata(scRNA_int, split_by = "sample", pal_setup = pal)
UMAP plotting, split by samples

As with plot_qc, the pal_setup argument is useful when a palette setup has been specified, because it keeps the color scheme consistent across plotting functions.

Plotting statistics

The count and proportion statistics of the clustering can be plotted with plot_stat. The plot_type argument is required and must be one of four values: "group_count", "cluster_count", "prop_fill", or "prop_multi". The resulting plots are shown below:

plot_stat(scRNA_int, plot_type = "group_count")

plot_stat(scRNA_int, plot_type = "cluster_count")

plot_stat(scRNA_int, plot_type = "prop_fill")

plot_stat(scRNA_int, plot_type = "prop_multi")

The group_by argument uses "sample" as the default grouping variable. It can be set to another factor in the metadata (e.g. "group") when plot_type is "group_count", "prop_fill", or "prop_multi".

plot_stat(scRNA_int, plot_type = "prop_fill", group_by = "group")

plot_stat(scRNA_int, plot_type = "prop_multi", group_by = "group")
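
To make the count and proportion statistics concrete, the following base R sketch computes the kind of tables these plots summarize. The toy metadata data frame and its values are hypothetical, and this is not meant to reflect how plot_stat is implemented internally.

```r
# Toy metadata standing in for the Seurat object's metadata
# (hypothetical values, for illustration only).
meta <- data.frame(
  sample = c("s1", "s1", "s1", "s2", "s2", "s2", "s2", "s2"),
  seurat_clusters = factor(c(0, 0, 1, 0, 1, 1, 1, 2))
)

# Number of cells per sample and per cluster.
group_count <- table(meta$sample)
cluster_count <- table(meta$seurat_clusters)

# Proportion of each cluster within each sample (each row sums to 1).
counts <- table(meta$sample, meta$seurat_clusters)
props <- prop.table(counts, margin = 1)

print(group_count)
print(cluster_count)
print(round(props, 2))
```

Replacing the sample column with another metadata factor, such as group, gives the tables that correspond to a different group_by setting.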

Plotting heatmap

Plotting a heatmap requires cluster markers, which are found with Seurat:

markers <- FindAllMarkers(scRNA_int, logfc.threshold = 0.1, min.pct = 0, only.pos = TRUE)

Then, the top genes in each cluster are plotted by plot_heatmap. The number of genes plotted per cluster, n, defaults to 8. In the heatmap, each row represents a gene and each column a cell. The cells are sorted by sort_var, which is set to c('seurat_clusters', 'sample') by default, meaning the cells are sorted first by cluster identity and then by sample. The bars above the heatmap are annotation bars; they can show categorical or continuous variables from the metadata, selected with the anno_var argument. The anno_colors argument specifies the colors for the corresponding annotation variables, so it must have the same length as anno_var. Use palettes suited to the variable type: qualitative palettes for categorical variables and sequential palettes for continuous ones. Currently, only RColorBrewer palettes are supported.

plot_heatmap(dataset = scRNA_int, 
              markers = markers,
              sort_var = c("seurat_clusters","sample"),
              anno_var = c("seurat_clusters", "sample","percent.mt","S.Score","G2M.Score"),
              anno_colors = c("Paired","Set2","Reds","Blues","Greens"))
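
As a rough illustration of what "top genes in each cluster" can mean, the base R sketch below keeps the n highest-ranked genes per cluster from a toy marker table. Ranking by avg_log2FC is an assumption (the column is named avg_log2FC in Seurat 4 and later, avg_logFC in earlier versions), the helper top_n_per_cluster is hypothetical, and plot_heatmap may select genes differently.

```r
# Toy marker table with columns similar to those FindAllMarkers returns
# (values are made up for illustration).
markers_toy <- data.frame(
  gene = c("A", "B", "C", "D", "E", "F"),
  cluster = factor(c("0", "0", "0", "1", "1", "1")),
  avg_log2FC = c(0.5, 2.1, 1.3, 0.9, 3.0, 0.2)
)

# Hypothetical helper: keep the n genes with the largest fold change
# within each cluster.
top_n_per_cluster <- function(df, n = 8) {
  pieces <- lapply(split(df, df$cluster), function(d) {
    head(d[order(d$avg_log2FC, decreasing = TRUE), ], n)
  })
  do.call(rbind, pieces)
}

top2 <- top_n_per_cluster(markers_toy, n = 2)
print(top2)
```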

GO Analysis

GO analysis results can be plotted with plot_cluster_go and plot_all_cluster_go. The former plots one specified cluster, while the latter iterates over all clusters. The topn argument of plot_cluster_go sets the number of top genes used for GO analysis; the default is 100. The org argument specifies the organism; "human" and "mouse" are the accepted values. plot_all_cluster_go is a wrapper for plot_cluster_go, which in turn wraps clusterProfiler::enrichGO, so arguments given in ... are passed through to the inner functions.

plot_cluster_go(markers, cluster_name = '1', org = "human", ont = "CC")

plot_all_cluster_go(markers, org = 'human', ont = "CC")

Plotting Measures

Measures are continuous variables in the metadata, as well as gene expression values. plot_measure and plot_measure_dim summarize these variables as box/violin plots and dimension reduction plots, respectively.

plot_measure(dataset = scRNA_int, 
             measures = c("KRT14","percent.mt"), 
             group_by = "seurat_clusters", 
             pal_setup = pal)

plot_measure_dim(dataset = scRNA_int, 
                 measures = c("nFeature_RNA","nCount_RNA","percent.mt","KRT14"))

plot_measure_dim(dataset = scRNA_int, 
                 measures = c("nFeature_RNA","nCount_RNA","percent.mt","KRT14"),
                 split_by = "sample")

GSEA Analysis

To perform GSEA, first find the differentially expressed genes (DEGs) and related measures with find_diff_genes. The resulting ranked list is then passed to test_GSEA. (Note: Seurat may take a long time to find DEGs. Parallel processing with the future package is recommended.)
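
A minimal sketch of enabling parallel processing with future before the DEG search is shown here. The worker count and memory limit are arbitrary example values, and how much find_diff_genes benefits is an assumption that depends on the Seurat functions it calls.

```r
library(future)

# Run future-aware computations in background R sessions.
# The number of workers is an arbitrary example value.
plan(multisession, workers = 2)

# Raise the size limit for objects exported to workers (here 2 GiB),
# since single-cell objects are often larger than the default limit.
options(future.globals.maxSize = 2 * 1024^3)
```

Calling plan(sequential) afterwards returns to ordinary single-process execution.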

de <- find_diff_genes(dataset = scRNA_int, 
                      clusters = as.character(0:6),
                      comparison = c("group", "CTCL", "Normal"),
                      logfc = 0)

gsea_res <- test_GSEA(de, 
                      pathway = pathways.hallmark)
plot_GSEA(gsea_res)
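
For readers unfamiliar with the term, a ranked list for GSEA is commonly a named numeric vector sorted by a per-gene statistic. The base R sketch below builds one from a toy table; the column names feature and logFC and all values are hypothetical and may not match the output of find_diff_genes or the input that test_GSEA expects.

```r
# Toy differential expression table (hypothetical columns and values).
de_toy <- data.frame(
  feature = c("KRT14", "CD3E", "MKI67", "ACTB"),
  logFC = c(1.8, -0.7, 0.4, 0.0)
)

# Named numeric vector, sorted from most up-regulated to most
# down-regulated gene.
ranks <- sort(setNames(de_toy$logFC, de_toy$feature), decreasing = TRUE)
print(ranks)
```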